
    Vision based localization: from humanoid robots to visually impaired people

    3D applications have become an increasingly popular topic in robotics, computer vision and augmented reality. By means of cameras and computer vision techniques, it is possible to obtain accurate 3D models of large-scale environments such as cities. In addition, cameras are low-cost, non-intrusive sensors compared to other sensors such as laser scanners, and they offer rich information about the environment. One application of great interest is vision-based localization in a prior 3D map. Robots need to perform tasks in the environment autonomously, and for this purpose it is very important to know the location of the robot in the map precisely. In the same way, providing accurate information about the location and spatial orientation of the user in a large-scale environment can benefit people who suffer from visual impairment. Safe and autonomous navigation in known or unknown environments is a great challenge for those who are blind or visually impaired. Most commercial solutions for localization and navigation assistance for the visually impaired are based on the satellite Global Positioning System (GPS). However, these solutions are not suitable enough for the visually impaired community in urban environments: the errors are on the order of several meters, and there are other problems such as GPS signal loss or line-of-sight restrictions. In addition, GPS does not work if an insufficient number of satellites is directly visible, so it cannot be used in indoor environments. It is therefore important to do further research on new, more robust and accurate localization systems.

    In this thesis we propose several algorithms to obtain accurate real-time vision-based localization from a prior 3D map. For that purpose, a 3D map of the environment must be computed beforehand, using well-known techniques such as Simultaneous Localization and Mapping (SLAM) or Structure from Motion (SfM). In this thesis, we implement a visual SLAM system that uses a stereo camera as the only sensor and obtains accurate 3D reconstructions of the environment. The proposed SLAM system is also capable of detecting moving objects at close range to the camera (up to approximately 5 meters), thanks to a moving objects detection module built on a dense scene flow representation of the environment that provides the 3D motion of the world points. This moving objects detection module is very effective in highly crowded and dynamic environments, where there is a huge number of dynamic objects such as pedestrians. By means of this module we avoid adding erroneous 3D points to the SLAM process, yielding much better and more consistent 3D reconstruction results. To the best of our knowledge, this is the first time that dense scene flow and the derived detection of moving objects have been applied in the context of visual SLAM for challenging crowded and dynamic environments, such as the ones presented in this thesis.

    In SLAM and vision-based localization approaches, 3D map points are usually described by means of appearance descriptors, which allow the data association between 3D map elements and perceived 2D image features to be established. In this thesis we investigate a novel family of appearance descriptors known as Gauge-Speeded Up Robust Features (G-SURF). These descriptors are based on the use of gauge coordinates: every pixel in the image is described in its own local coordinate frame, defined by the local structure itself and consisting of the gradient vector and its perpendicular direction. We have carried out an extensive experimental evaluation on different applications such as image matching, visual object categorization and 3D SfM, showing the usefulness and improved results of G-SURF descriptors against other state-of-the-art descriptors such as the Scale Invariant Feature Transform (SIFT) or SURF.

    In vision-based localization applications, one of the most computationally expensive steps is the data association between a large map of 3D points and the 2D features perceived in the image. Traditional approaches often rely purely on appearance information to solve this step. Such algorithms can have a high computational demand, and in environments with highly repetitive textures, such as cities, the data association can lead to erroneous results due to the ambiguities introduced by visually similar features. In this thesis we have developed an algorithm for predicting the visibility of 3D points by means of a memory-based learning approach applied to a prior 3D reconstruction. Thanks to this learning approach, we can speed up the data association step by predicting the visible 3D points given a prior camera pose.

    We have implemented and evaluated visual SLAM and vision-based localization algorithms for two applications of great interest: humanoid robots and visually impaired people. Regarding humanoid robots, a monocular vision-based localization algorithm with visibility prediction has been evaluated under different scenarios and types of sequences, such as square and circular trajectories, sequences with moving objects, changes in lighting, etc. The localization and mapping error has been compared against a precise motion capture system, yielding errors on the order of a few centimeters. Furthermore, we also compared our vision-based localization system with the Parallel Tracking and Mapping (PTAM) approach, obtaining much better results with our localization algorithm. With respect to the vision-based localization approach for the visually impaired, we have evaluated the system in indoor, cluttered office-like environments. In addition, we have evaluated the visual SLAM algorithm with moving objects detection in tests with real visually impaired users in very dynamic environments, such as inside the Atocha railway station (Madrid, Spain) and in the city center of Alcalá de Henares (Madrid, Spain). The obtained results highlight the potential benefits of our approach for the localization of the visually impaired in large and cluttered environments.
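
    To make the moving objects detection idea above more concrete, the following is a minimal sketch of a dense scene-flow test for rejecting dynamic points before they enter the SLAM map. It is illustrative only, not the thesis implementation: it assumes stereo depth maps, a dense optical-flow field between consecutive left images and an ego-motion estimate (R, t) are already available, and the backproject/moving_point_mask helpers and the 0.15 m residual threshold are invented for the example.

    import numpy as np

    def backproject(depth, K):
        # Back-project a depth map (H x W, in meters) into 3D camera coordinates.
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth
        x = (u - K[0, 2]) * z / K[0, 0]
        y = (v - K[1, 2]) * z / K[1, 1]
        return np.dstack([x, y, z])                      # H x W x 3

    def moving_point_mask(depth0, depth1, flow, R, t, K, thresh=0.15):
        # Flag pixels whose residual 3D motion (scene flow minus the camera
        # ego-motion given by R and t) exceeds `thresh` meters; those points
        # are treated as dynamic and are not added to the SLAM map.
        h, w = depth0.shape
        P0 = backproject(depth0, K).reshape(-1, 3)
        # Follow the dense optical flow to each pixel's position in the next frame.
        u, v = np.meshgrid(np.arange(w, dtype=np.float32),
                           np.arange(h, dtype=np.float32))
        u1 = np.clip(np.rint(u + flow[..., 0]), 0, w - 1).astype(int)
        v1 = np.clip(np.rint(v + flow[..., 1]), 0, h - 1).astype(int)
        P1 = backproject(depth1, K)[v1, u1].reshape(-1, 3)
        # Where a static point should be after the camera ego-motion.
        P0_pred = (R @ P0.T).T + t
        residual = np.linalg.norm(P1 - P0_pred, axis=1)
        return (residual > thresh).reshape(h, w)

    In a real pipeline the flow would come from a dense optical-flow method and the depth from stereo matching; points flagged by the mask (for instance, on pedestrians closer than about 5 meters) are simply excluded from the map update.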

    Face pose estimation with automatic 3D model creation in challenging scenarios

    This paper proposes a new method to perform real-time face pose estimation for ±90° yaw rotations and under low-light conditions. The algorithm works on the basis of completely automatic, run-time incremental 3D face modelling. The model is initially built from a set of 3D points derived from stereo grey-scale images. As new areas of the subject's face appear to the cameras, new 3D points are automatically added to complete the model. In this way, we can estimate the pose for a wide range of rotation angles, where typically the 3D frontal points are occluded. We propose a new feature re-registering technique that combines the views of both cameras of the stereo rig in order to perform fast and robust tracking over the full range of yaw rotations. The Levenberg–Marquardt algorithm is used to recover the pose, and a RANSAC framework rejects incorrectly tracked points. The model is continuously optimised in a bundle adjustment process that reduces the accumulated error of the 3D reconstruction. The intended application of this work is estimating the focus of attention of drivers in a simulator, which imposes challenging requirements. We validate our method on sequences recorded in a naturalistic truck simulator, on driving exercises designed by a team of psychologists. Ministerio de Ciencia e Innovación.
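
    As a rough illustration of the pose-recovery step (Levenberg–Marquardt pose estimation inside a RANSAC loop that rejects bad tracks), the sketch below uses OpenCV's robust PnP solver rather than the authors' own implementation. The intrinsics, the synthetic 3D face model and the noisy 2D tracks are all placeholder data invented for the example.

    import numpy as np
    import cv2

    K = np.array([[700.0, 0.0, 320.0],                   # illustrative intrinsics
                  [0.0, 700.0, 240.0],
                  [0.0, 0.0, 1.0]])
    dist = np.zeros(5)                                    # assume undistorted images

    # Synthetic stand-ins: a 3D point cloud acting as the face model, projected
    # with a known pose and perturbed to simulate noisy feature tracks.
    rng = np.random.default_rng(0)
    model_points = rng.uniform(-0.1, 0.1, (50, 3)).astype(np.float32)
    rvec_true = np.array([0.0, 0.5, 0.0])                 # about 29 deg of yaw
    tvec_true = np.array([0.0, 0.0, 0.6])                 # 60 cm in front of the camera
    proj, _ = cv2.projectPoints(model_points, rvec_true, tvec_true, K, dist)
    tracked_points = (proj.reshape(-1, 2)
                      + rng.normal(0.0, 0.5, (50, 2))).astype(np.float32)

    # RANSAC rejects incorrectly tracked points; SOLVEPNP_ITERATIVE refines the
    # pose with Levenberg-Marquardt on the inlier set.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        model_points, tracked_points, K, dist,
        reprojectionError=3.0, iterationsCount=100,
        flags=cv2.SOLVEPNP_ITERATIVE)

    if ok:
        Rmat, _ = cv2.Rodrigues(rvec)
        angles, *_ = cv2.RQDecomp3x3(Rmat)                # Euler angles in degrees
        print("yaw estimate (deg):", round(float(angles[1]), 1),
              "inlier tracks:", len(inliers))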

    Are you ABLE to perform a life-long visual topological localization?

    Visual topological localization is a process typically required by many mobile autonomous robots, but it becomes a complex task when long operating periods are considered, because of the appearance variations a place suffers over time: dynamic elements, illumination or weather. Due to these problems, long-term visual place recognition across seasons has become a challenge for the robotics community. For this reason, we propose an innovative method for robust and efficient life-long localization using cameras. In this paper, we describe our approach (ABLE), which includes three different versions depending on the type of images: monocular, stereo and panoramic. This distinction makes our proposal more adaptable and effective, because it allows us to exploit the extra information that each type of camera can provide. In addition, we contribute a novel methodology for identifying places, based on fast matching of global binary descriptors extracted from sequences of images. The presented results demonstrate the benefits of using ABLE, which is compared against the most representative state-of-the-art algorithms in long-term conditions. Ministerio de Economía y Competitividad. Comunidad de Madrid.
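
    The sequence-matching idea can be sketched in a few lines. This is only a simplified stand-in for ABLE, under the assumption that a thresholded, downsampled thumbnail acts as the global binary descriptor (the actual method builds on binary features); the descriptor size, sequence length and helper names are invented for the example.

    import numpy as np
    import cv2

    def global_binary_descriptor(img, size=(16, 16)):
        # Downsample, blur and binarise a grey-scale image into a global bit string.
        small = cv2.resize(cv2.GaussianBlur(img, (5, 5), 0), size,
                           interpolation=cv2.INTER_AREA)
        return (small > small.mean()).flatten()

    def sequence_descriptor(frames, length=4):
        # Concatenate the descriptors of the last `length` frames, so a place is
        # represented by a short sequence of images rather than a single one.
        return np.concatenate([global_binary_descriptor(f) for f in frames[-length:]])

    def hamming(a, b):
        # Hamming distance between two binary descriptors of equal length.
        return int(np.count_nonzero(a != b))

    # Usage: the live sequence is matched against every stored sequence in the
    # map, and the best match is accepted if its distance is below a threshold.
    # best = min(map_sequences, key=lambda s: hamming(sequence_descriptor(live), s))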

    Gauge-SURF descriptors

    In this paper, we present a novel family of multiscale local feature descriptors, a theoretically and intuitively well-justified variant of SURF which is straightforward to implement yet capable of demonstrably better performance at comparable computational cost. Our family of descriptors, called Gauge-SURF (G-SURF), is based on second-order multiscale gauge derivatives. While the standard derivatives used to build a SURF descriptor are all relative to a single chosen orientation, gauge derivatives are evaluated relative to the gradient direction at every pixel. Like standard SURF descriptors, G-SURF descriptors are fast to compute thanks to the use of integral images, but they have extra matching robustness due to the additional invariance offered by gauge derivatives. We present extensive experimental image matching results on the Mikolajczyk and Schmid dataset which show the clear advantages of our family of descriptors against descriptors based on first-order local derivatives, such as SURF, Modified-SURF (M-SURF) and SIFT, in both standard and upright forms. In addition, we show experimental results on large-scale 3D Structure from Motion (SfM) and visual categorization applications.
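
    For reference, the second-order gauge derivatives on which G-SURF builds can be computed as follows. This is a minimal sketch of the underlying quantities, not the descriptor implementation from the paper: the smoothing scale and the Sobel-based derivative filters are illustrative choices.

    import numpy as np
    import cv2

    def gauge_derivatives(img, sigma=1.6):
        # Second-order derivatives in gauge coordinates: w is the local gradient
        # direction and v its perpendicular, so every pixel has its own frame.
        L = cv2.GaussianBlur(img.astype(np.float32), (0, 0), sigma)
        Lx = cv2.Sobel(L, cv2.CV_32F, 1, 0, ksize=3)
        Ly = cv2.Sobel(L, cv2.CV_32F, 0, 1, ksize=3)
        Lxx = cv2.Sobel(L, cv2.CV_32F, 2, 0, ksize=3)
        Lyy = cv2.Sobel(L, cv2.CV_32F, 0, 2, ksize=3)
        Lxy = cv2.Sobel(Lx, cv2.CV_32F, 0, 1, ksize=3)
        norm2 = Lx**2 + Ly**2 + 1e-12                     # avoid division by zero
        Lww = (Lx**2 * Lxx + 2 * Lx * Ly * Lxy + Ly**2 * Lyy) / norm2
        Lvv = (Ly**2 * Lxx - 2 * Lx * Ly * Lxy + Lx**2 * Lyy) / norm2
        return Lww, Lvv

    Because Lww and Lvv are measured relative to the local gradient direction, they are invariant to in-plane rotation without estimating a single dominant orientation for the whole patch.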

    Reliable Indoor Navigation on Humanoid Robots using Vision-Based Localization

    Reliable localization on a humanoid robot is a sine qua non condition for realizing complex robotics scenarios. Before dealing with perturbations and online modification of the environment, one has to make sure that the planned trajectory alone will be correctly followed. This paper demonstrates that a control framework suited to humanoid robots, relying on a vision-based localization system, can achieve this goal. Our framework is based on a real-time vision-based localization system that assumes a pre-existing 3D map of the environment and obtains accurate results in complex robotics scenarios. By compensating for execution errors such as drift and robot model errors, the HRP-2 robot is able to achieve high-precision tasks.

    How to localize humanoids with a single camera?

    In this paper, we propose a real-time vision-based localization approach for humanoid robots using a single camera as the only sensor. In order to obtain an accurate localization of the robot, we first build an accurate 3D map of the environment. In the map computation process, we use stereo visual SLAM techniques based on non-linear least squares optimization (bundle adjustment). Once we have computed a 3D reconstruction of the environment, which comprises a set of camera poses (keyframes) and a list of 3D points, we learn the visibility of the 3D points by exploiting all the geometric relationships between the camera poses and the 3D map points involved in the reconstruction. Finally, we use the prior 3D map and the learned visibility prediction for monocular vision-based localization. Our algorithm is efficient, easy to implement, and more robust and accurate than existing approaches. By means of visibility prediction we predict, for a query pose, only the highly visible 3D points, thus tremendously speeding up the data association between 3D map points and perceived 2D features in the image. In this way, we can solve the Perspective-n-Point (PnP) problem very efficiently, providing robust and fast vision-based localization. We demonstrate the robustness and accuracy of our approach through several vision-based localization experiments with the HRP-2 humanoid robot. Ministerio de Economía y Competitividad. Comunidad de Madrid.
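
    A minimal sketch of the visibility-prediction idea is given below; it is not the learned model from the paper, but a simple nearest-keyframe voting stand-in. The data layout (keyframe positions and per-keyframe lists of observed point ids), the neighbour count and the vote threshold are all assumptions made for the example.

    import numpy as np

    def predict_visible_points(prior_pos, keyframe_positions, keyframe_point_ids,
                               k=5, min_votes=2):
        # Return the ids of 3D map points observed by at least `min_votes` of
        # the k keyframes closest to the prior camera position (map frame).
        d = np.linalg.norm(keyframe_positions - prior_pos, axis=1)
        nearest = np.argsort(d)[:k]
        votes = {}
        for kf in nearest:
            for pid in keyframe_point_ids[kf]:
                votes[pid] = votes.get(pid, 0) + 1
        return [pid for pid, v in votes.items() if v >= min_votes]

    # Image features are then matched only against this reduced subset of map
    # points, and the pose is recovered with a robust PnP solver, for example:
    # ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d_subset, pts2d_matched, K, None)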

    Automatic traffic signs and panels inspection system using computer vision

    Computer vision techniques applied to road maintenance systems, whether related to traffic signs or to the road itself, are playing a major role in many countries because of the higher investment in public works of this kind. These systems are able to collect a wide range of information automatically and quickly, with the aim of improving road safety. In this context, the correct visibility of traffic signs and panels is vital for the safety of drivers. This paper describes VISUALISE (VISUAL Inspection of Signs and panEls), an automatic inspection system mounted on board a vehicle that performs inspection tasks at conventional driving speeds. VISUALISE allows for an improvement in the awareness of the road signaling state, supporting planning and decision making on the side of administrations and infrastructure operators. A description of the main computer vision techniques and some experimental results obtained over thousands of kilometers are presented. Finally, conclusions are drawn.